🔤 Character Classification

Discussed on Lobsters

🌏Character Sets GitHub·

Show HN: Topaz – A small Unicode-first language that compiles to Rust

Covered by ta.fo Journal

Discussed on Hacker News

🪟Awesome windows command-line E.Z. Hart·

The Restraint Not To Fix Everything

Covers 2 stories including Internet Archive Service Availability

📄Text Mining therage.co·

The Fed Is Working on a CBDC

Covers Project Agorá shows how tokenisation can improve wholesale cross-border payments; work will advance to real-value testing

Discussed on Hacker News

📄Text Mining arxiv.org·

Structuring and Tokenizing Distributed User Interest Context for Generative Recommendation

🌏Character Sets GitHub·

I built a 2.3MB Markdown-to-PDF app because Chromium felt absurd

Discussed on Hacker News

🔤Coded character sets arxiv.org·

Pixel-TTS: Image based Text Rendering for Robust Text-to-Speech

🔠Terminal Fonts GitHub·

Data Visualization from the Comfort of Your Terminal

Covers Zipf's Law

Discussed on Hacker News and Lobsters

👁️OCR Verification arxiv.org·

Stringalign: Moving beyond summary statistics with a transparent Unicode-aware tool for evaluating automatic transcription models

📄Text Mining arxiv.org·

TivTok: Broadcasting Time-Invariant Tokens for Scalable Video Tokenization

📄Text Mining arxiv.org·

OTRO: Oblivious Tokenization Path with Square-Root ORAM

📄Text Mining arxiv.org·

Do Generative Recommenders Deepen the Information Cocoon? A Closed-Loop Simulation with LLM-powered User Simulators

📄Text Mining arxiv.org·

Equity with Efficiency: An Empirical Study of Tokenizers for Multilingual Large Language Models

📄Text Mining arxiv.org·

Beyond Tokenization: Direct Timestep Embedding and Contrastive Alignment for Time-Series Question Answering

🎯Content Recommendation arxiv.org·

Leveraging Code-Mixed Product Metadata and User Feedback for Personalized Recommendation on Daraz Bangladesh

🏛️Philosophy arxiv.org·

Emergent retokenization symmetry in large language models: phenomenology and applications

🏛️Philosophy arxiv.org·

JetParticle-JEPA: An Efficient Self-Supervised Representation Learning method for Jet Tagging in High-Energy Physics

📄Text Mining arxiv.org·

Toward Training-Free Zero-Shot Anomaly Detection in 3D Medical Images: A Batch-Based Approach Using 2D Foundation Models


Beyond Perplexity: UTF-8 Validity in Byte-aware Language Models

PHP 8.5.7 `mb_substr()` 'SJIS-mac' size_t underflow

all the ways in which terminals' text rendering is bad (2024)
